[WIP, RFC] doc/memos: Added RDM on high level timer API requirements and common features #12970

maribu · 2019-12-16T21:45:19Z

Contribution description

Early draft for an RDM on a high level timer API.

Testing procedure

n/a

Issues/PRs references

None

maribu · 2019-12-16T21:50:17Z

I think I now requested reviews from every maintainer that contributed to either xtimer or ztimer and every reviewer of ztimer.

miri64 · 2019-12-17T08:15:53Z

If it is a draft, please keep to the naming scheme for drafts:

Initially, a proposed RDM is identified by its temporary file name of shape rdm-draft--.md (for example: rdm-draft-baccelli-rdm-format.md) instead of a number assigned at a later stage. At this stage, the RDM's header indactes status "draft".

maribu · 2019-12-17T08:52:42Z

If it is a draft, please keep to the naming scheme for drafts:

Done

doc/memos/rdm-draft-buschsieweke-timer-api.md

kaspar030 · 2019-12-17T09:04:18Z

doc/memos/rdm-draft-buschsieweke-timer-api.md

+- The implementation must only disable IRQs for short periods of time
+- The implementation must cause as little IRQs as possible and its ISRs must
+  be as short as possible
+- The implementation should provide means to execute callbacks in thread


this is IMO out of scope

... or should maybe go to the utility functionality...

or should maybe go to the utility functionality

Yep. It is already in the feature requirements below utilities. To me, this is an important thing to improve real time capabilities in a convenient way. (Obviously, one could just write their own handlers to call event_post(); but I think having a utility function for that can be much easier to use.)

I'm not sure if it makes sense to have this written twice. To me this is both a feature requirement and a realtime requirement; so I just added it twice.

doc/memos/rdm-draft-buschsieweke-timer-api.md

kaspar030 · 2019-12-17T09:13:07Z

doc/memos/rdm-draft-buschsieweke-timer-api.md

+
+## Precision Requirements
+
+- Timer callbacks should never fire prematurely (+- 1 tick of the underlying


Important point. Is "-1" acceptable?

With that formulation, clearly yes. I think the effort needed to prevent this could be quite significant. So if no one cares about it triggering early by 1 tick enough to speak up, we should just allow this :-)

I think the effort needed to prevent this could be quite significant.

set(val) { val += 1; ... }, or run_callback() { while(now() < target_time) {}; ... can achieve this.

We had a discussion some time ago, where people convinced me that "at least as" semantics are the only sane ones. Don't care about one tick? Set the timeout one less. (Having written that, I realize it also works the other way. :) )

The impact of this is very different depending on us, ms, or even second ticks are used.

I'm OK with keeping -1 for now.

doc/memos/rdm-draft-buschsieweke-timer-api.md

maribu · 2020-09-21T09:45:04Z

I updated the title to contain both the survey and the requirements analysis and also added every contributor as author (sorted by last name, alphabetically).

emmanuelsearch · 2020-09-21T10:29:58Z

I agree with @aabadie here. From my point of view, the most crucial part of this document for our next steps is the requirements we set for our high-level timer API. Would be bizarre not to put this in the title of the RDM.

tcschmidt · 2020-09-21T11:45:51Z

doc/memos/rdm-draft-buschsieweke-timer-api.md

+while not being to strict to rule out reasonable trade-offs e.g. between
+ROM/RAM requirements and latency. An *O(n)* latency is e.g. feasible to
+implement with a sorted linked list and therefore a reasonable lower bar.
+


O(n) is an unbound function, so this justification is inconsistent. For real-time requirements ask for O(1), I suppose.

For every concrete application, there is an upper bound for the number n of software timers being active at the same time. So for every concrete application, O(n) has an upper bound.

This is true for any finite set of numbers. So we could also go for an O(exp(n))?

O(n) is an unbound function, so this justification is inconsistent. For real-time requirements ask for O(1), I suppose.

This is intentionally written to not be O(1), because of the reasonable trade-offs as described. We know how to do O(n) by simply using linked lists and sorting on insert, and we understand the runtime implications well. We don't know how to do this in O(1) without limiting the number of allowed timers.

We do multiplexing here, so one hardware timer (or more, but a small number) needs to be used to schedule N timeouts.
So when adding one of these timeouts (using a set(interval) function), that new timeout needs to be added into the list of already queued timers. Using linked lists, this is an O(n) operation. Using an array and binary search, this is O(logn). We need to be careful what to require here, because if we break O(n*logn) for sorting n timers, we might attract a lot of attention.

It is possible to add an allowed timers limit (that is lower than available memory, e.g., 16 or 128) and enforce this.
We'd suddenly have O(1) on paper (meeting the requirements), but wouldn't have gained anything, other than that now a multitude of functions have an error case (timer_sleep() might fail with "maximum number of timers reached") that needs to be checked everywhere, which is unreasonable.

TL;DR let's not require something we don't know how to implement. This is phrased as "yadda yadda reasonable trade-offs".

The complete timer management can be done in O(log(n)), which is almost constant in the number range we consider realistically. Further optimizations can apply, if algorithms account for the deadline of the next firing timer, which can be determined in constant time.
Current preliminary HiL-tests of @pokgak show a rather poor behavior of xtimer and a somewhat unclear signature of ztimer. We will publish once ready.
In any case, it is important that timer firing remains within predictable bounds independent of the number of instantiated timers - otherwise real-time gets out of control.

log(25) ~ 1.39, exp(25) ~ 7.2 x 10**10.

log(25) ~ 1.39, exp(25) ~ 7.2 x 10**10.

Both bounded, which is exactly what a realtime system needs.

Really?

Do you want to dive into the numerical analysis of the numbers you provided or can we agree here that both numbers are constants?

A realtime system does not require the system to be fast. It does however require the system to respond within the specified deadlines. This deadline can be in the order of microsecond or in the order of years, depending on the requirements.

Furthermore, a complex O(1) algorithm, taking 10 seconds performs worse compared to a simple O(n) algorithm taking 10 ms + n * 1 ms. when n is known to be bounded to 100.

As a solution here, why don't we rephrase a bit and specify that it must be trivial to determine the maximum number of active timers for an application in order to ensure bounded execution time?

kaspar030 · 2020-09-24T09:29:54Z

@MichelRottleuthner do you want to add requirements, or are you fine with this for now?
@tcschmidt are you OK with the title now?

jue89

I just had a deeper look into this RDM. Its been a while since the last time I did that.

All in all, this RDM covers all features we require for our use cases. A nice piece of work!

Just a minor comment. (Sorry for kicking in this late during the polishing process!)

doc/memos/rdm-draft-buschsieweke-timer-api.md

fjmolinas · 2020-10-22T09:35:46Z

ping @maribu!

kaspar030 · 2020-10-22T10:28:33Z

This PR started out to be the RDM on timers, so we could base any further work on it with confidence. But currently it is a collection of requirements and some survey on features other timer implementations provide. It does not go into specific design questions for a high-level timer implementation.

At the same time, ztimer got merged and provides many solutions to problems we have with xtimer. But migrating to ztimer as the default timer is a bit stalled for lack of an RDM approved design, which this RDM was supposed to give, but doesn't.

I think we all agree that doing comprehensive research on "how to do high-level timers right[tm]" is a lot of work. Many of us think that ztimer is a step in the right direction, and we should migrate. AFAIK noone thinks xtimer can be turned into whatever we need without substantial rework. And xtimer effectively blocks RIOT from sleeping, which should be a fix first huuuuge bug for an OS supposed to run on low-power devices.

Can we agree on most of the requirements, and concentrate on that for this PR?

We could then move forward with ztimer and check if it meets this PR's requirements, in a follow-up RDM.
Also, work on a proper "how to do high-level timers right[tm]" RDM can start.

(would it make sense to even drop the survey part here and save it for the more extensive research RDM?)

benpicco · 2020-10-22T10:32:41Z

But migrating to ztimer as the default timer is a bit stalled for lack of an RDM approved design, which this RDM was supposed to give, but doesn't.

I don't think the lack of RDM stops ztimer from being used by default.
AFAIR there are still features missing from ztimer (ztimer_mutex_lock_timeout()) that prevent it from replacing xtimer entirely.

tcschmidt · 2020-10-23T23:29:02Z

Even though I believe a rigorous and exhaustive design document would be essential for reworking the timer API, I don't see any of this here. It rather appears as an - somewhat related - informational document about "some thoughts about timers without rigorous justification". This certainly cannot serve as a foundation for redesign. The question is rather, whether this is helpful and needed or whether this can simply be discarded.
Any opinions?

maribu · 2020-10-25T11:24:03Z

Let me recapitulate the history of this:

We have with xtimer a high level timer API that is known for years to not be insufficient for any low power scenario
With ztimer an alternative was proposed that is known to be a huge step in the right direction
The transition to ztimer was blocked with the argument that an RDM is needed first
An RDM was started to address this
The goals of the RDM were scaled down to reflect the lack of progress and engagement as well as the inability to reach a consensus on anything more ambitions than a list of basic requirements
- The small list of features and superficial implementation details of timer APIs of competing OS could also stay (at least so far) - good to know that at least lists unopinionated facts don't spawn endless discussions
Even for this unambitious RDM it seems to be difficult to reach a consensus - despite that this list of requirements is relatively unopinionated due to a lack of prioritization of the requirements most likely to end up on opposing sides of trade-offs
Just when it seems that finally an agreement on this least ambitions outcome of the RDM is in reach - people start to call for burying the RDM
Meanwhile: RIOT remains completely unsuitable for any low power scenarios in default configuration - despite a solid alternative that performs well in low power scenarios is already in the code base

An particularly interesting observation is that the very persons that were the first to call out for an RDM as requirement for any progress in regard to the timer API are seemingly the last persons to contribute to it.

I'm willing to keep pushing this RDM to some sort of conclusion. But please keep me out of the loop for any follow up.

benpicco · 2020-10-25T12:25:14Z

The transition to ztimer was blocked with the argument that an RDM is needed first

Is that really the case? If ztimer works and is a drop in replacement for xtimer, nobody will care about the RDM anymore.
With the 802.15.4 radio HAL the RDM is used to discuss how things should be implemented, then implement them. It's a living document that aids development.
Having an RDM for the sake of having an RDM is pointless.

What really blocks transition to ztimer is that not all features of xtimer are covered yet.
Once it's a drop in replacement we can just replace xtimer with ztimer_xtimer_compat.

maribu · 2020-10-25T16:43:48Z

The transition to ztimer was blocked with the argument that an RDM is needed first

Is that really the case?

From the discussion in the devel mailing list I pretty much got the impression. Here are some quotes of the key arguments leading to this RDM:

Side note: I think we should move the discussion to an RDM PR for the
design document and discuss there to not get lost.

@MichelRottleuthner, 10. 12. 2019, full email for context

What do you think about an RDM PR?

@MichelRottleuthner, 11. 12. 2019, full email for context

I'd favor to not merge it before we worked on the RDM as it may show
that there are things that should be done differently.

[...]

I'll provide the collected information to the RDM.

[...]

No, I mean if "having a non functional timer for many years" happens again.
I think the way how xtimer did is job over these years is not something
we want to repeat.

So an RDM would have prevented this? I doubt it. xtimer's ISR buggyness
was known for years even without an RDM. We were just inexperienced
(or ignorant, or incompetent, your pick).

Yes I honestly think it could have helped to on that end to thoroughly
discuss the key problems with a broader audience of interested and
competent people.

@MichelRottleuthner, 13. 12. 2019, full email for context

@tcschmidt did not explicitly ask for an RDM, but for a "problem statement and design document" and wanted "a clear and falsifiable problem statement" prior to even discussing the adoption of ztimer, see full email.

To my best understanding, both @MichelRottleuthner and @tcschmidt saw completion of an RDM (or in case of @tcschmidt an RDM-like document) as requirement for adoption of ztimer. So I started this off:

in order to push this forward I just opened a PR for an RDM at
#12970. This PR is in an early state and
both feedback and help will be greatly appreciated.

@maribu, 16. 12. 2019, full email for context

I think it is fair to say that nobody's expectations towards this RDM were met. And honestly, I see no indication that a second try should work out better.

I'm not sure if what @benpicco was suggesting to rather work on improving ztimer and fill the gaps rather than endlessly stalling progress because of RDMs that don't get anywhere. But if so: strong +1

Btw: #15052 would be a huge step towards ztimer_mutex_lock_timeout().

waehlisch · 2020-10-25T17:05:22Z

The transition to ztimer was blocked with the argument that an RDM is needed first

Is that really the case?

several people asked for an RDM to have a base for an informed discussion. it was not about "Having an RDM for the sake of having an RDM".

quite often sound solutions need informed discussions, to prevent trial and error. ... as someone, i think it was Kaspar, said in the timer breakout session in Helsinki: why should we be successful this time? we know from the past that code that is not enough.

emmanuelsearch · 2020-10-26T10:45:50Z

I think it is fair to say that nobody's expectations towards this RDM were met. And honestly, I see no indication that a second try should work out better.

@maribu thanks a lot for the work you put in this.

Maybe we are falsely lead to see this glass half empty.
On the positive, half-full side:

we have collaboratively gathered a comprehensive list of requirements;
we have a solution designed and partly implemented -- but which already seems to improve the status quo;
some already use this solution in production.

Let's not lose sight of the above, which can be considered useful progress.
In that respect, I consider this RDM already useful -- independent of its stalling for other reasons.

benpicco · 2020-11-09T16:47:18Z

As this seems to be the place to put general ideas about the high-level timer API, here is another one:

I think the high-level timer could be simplified if it configured the low-level timer in down-counting mode.
If we want to set a timeout in n ticks, right now we have to

get the current time
add n, make sure we did not overshoot already

instead, with down-counting timers we only have to

set timeout to n

Now this simplifies the set_timeout() function at the expense of the get_time() function - however, I think the former is executed a lot more often.

What do you think?

tcschmidt · 2020-11-10T23:32:04Z

@benpicco Interesting perspective. I suppose it needs experimental validation, though.
Also, low-level timers should see more requirements or desires from the high-level side: A lot more interrelation with #15046 would clearly progress insights w.r.t. the real problems of timers.

stale · 2021-06-02T16:36:56Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

stale · 2022-03-02T09:18:59Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.

maribu requested review from jnohlgard, kaspar030, MichelRottleuthner, haukepetersen, OlegHahm, aabadie, PeterKietzmann, smlng, vincent-d, miri64, SemjonWilke and bergzand December 16, 2019 21:45

doc/memos: Added draft RDM on high level timer API

d53b2fc

maribu force-pushed the timer-rdm branch from 432d0c9 to d53b2fc Compare December 17, 2019 08:50

kaspar030 reviewed Dec 17, 2019

View reviewed changes

doc/memos/rdm-draft-buschsieweke-timer-api.md Show resolved Hide resolved

kaspar030 reviewed Dec 17, 2019

View reviewed changes

doc/memos/rdm-draft-buschsieweke-timer-api.md Outdated Show resolved Hide resolved

kaspar030 reviewed Dec 17, 2019

View reviewed changes

doc/memos/rdm-draft-buschsieweke-timer-api.md Outdated Show resolved Hide resolved

kaspar030 reviewed Dec 17, 2019

View reviewed changes

doc/memos/rdm-draft-buschsieweke-timer-api.md Show resolved Hide resolved

miri64 changed the title ~~[WIP, RFC] doc/memos: Added RDM0002 on high level timer API~~ [WIP, RFC] doc/memos: Added RDM on high level timer API Dec 17, 2019

kaspar030 reviewed Dec 17, 2019

View reviewed changes

doc/memos/rdm-draft-buschsieweke-timer-api.md Outdated Show resolved Hide resolved

kaspar030 reviewed Dec 17, 2019

View reviewed changes

doc/memos/rdm-draft-buschsieweke-timer-api.md Show resolved Hide resolved

fixup! doc/memos: Added draft RDM on high level timer API

6e30f5c

maribu changed the title ~~[WIP, RFC] doc/memos: Added RDM on high level timer API~~ [WIP, RFC] doc/memos: Added RDM on high level timer API requirements and common features Sep 21, 2020

tcschmidt reviewed Sep 21, 2020

View reviewed changes

jue89 requested changes Sep 25, 2020

View reviewed changes

doc/memos/rdm-draft-buschsieweke-timer-api.md Show resolved Hide resolved

stale bot added the State: stale State: The issue / PR has no activity for >185 days label Jun 2, 2021

SemjonWilke removed their request for review June 4, 2021 11:22

stale bot removed the State: stale State: The issue / PR has no activity for >185 days label Jun 4, 2021

fjmolinas added this to To do in Ztimer Jun 11, 2021

MrKevinWeiss added this to the Release 2021.07 milestone Jun 22, 2021

MrKevinWeiss removed this from the Release 2021.07 milestone Jul 15, 2021

stale bot added the State: stale State: The issue / PR has no activity for >185 days label Mar 2, 2022

stale bot closed this Apr 17, 2022

Ztimer automation moved this from To do to Done Apr 17, 2022

maribu deleted the timer-rdm branch January 21, 2023 21:41


		## Precision Requirements

		- Timer callbacks should never fire prematurely (+- 1 tick of the underlying

[WIP, RFC] doc/memos: Added RDM on high level timer API requirements and common features #12970

[WIP, RFC] doc/memos: Added RDM on high level timer API requirements and common features #12970

Conversation

maribu commented Dec 16, 2019 • edited

Contribution description

Testing procedure

Issues/PRs references

maribu commented Dec 16, 2019

miri64 commented Dec 17, 2019

maribu commented Dec 17, 2019

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maribu commented Sep 21, 2020

emmanuelsearch commented Sep 21, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bergzand Oct 25, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

kaspar030 commented Sep 24, 2020

jue89 left a comment

Choose a reason for hiding this comment

fjmolinas commented Oct 22, 2020

kaspar030 commented Oct 22, 2020

benpicco commented Oct 22, 2020

tcschmidt commented Oct 23, 2020

maribu commented Oct 25, 2020

benpicco commented Oct 25, 2020

maribu commented Oct 25, 2020

waehlisch commented Oct 25, 2020

emmanuelsearch commented Oct 26, 2020

benpicco commented Nov 9, 2020

tcschmidt commented Nov 10, 2020

stale bot commented Jun 2, 2021

stale bot commented Mar 2, 2022

maribu commented Dec 16, 2019 •

edited

bergzand Oct 25, 2020 •

edited